Abstract.  The determination of pattern recognition rules is
viewed as a problem of computer induction, under the guidance
of generalization rules and rules representing knowledge of
the recognition problem at hand.  The paper formulates the
underlying theory for generalization and optimization of
descriptions of object classes, expressed in the form of decision
rules.  The language for formulating descriptions is an extension
of the first order predicate calculus, called the variable-valued logic
calculus VL21.  VL21 contains several new syntactic forms, specially
oriented for expressing inductive processes.  The presented
approach uniformly combines descriptors (variables, predicates,
functions) of three different types: nominal, linear and structured,
and has an ability to generate new descriptors not used in the
initial data rules.

1.   INTRODUCTION

A  pattern  recognition  rule  can  be  viewed  as  a  rule 

DESCRIPTION  ::>  RECOGNITION  CLASS  (1) 

which  assigns  a  situation    (an  object,  a  process,  etc.)  to  the  RECOGNITION 
CLASS,  when  the  situation  satisfies  the  DESCRIPTION.   In  the  decision  space 
approach  the  DESCRIPTION  is  in  the  form  of  an  analytical  expression  involving 
a  set  of  numerical  variables  selected  a  priori.  Variables  spanning  the 
decision  space  are  treated  uniformly,  are  usually  assumed  to  be  measured  on 
at  least  an  interval  scale,  and  are  desired  to  be  relevant  and  independent 
characteristics  of  the  objects.   When  the  variables  are  strongly  interconnected
and  the  relevant  object  characteristics  are  various  relations  among  the 
variables,  or  among  parts  or  subparts  of  objects,  then  the  decision  space 
approach  becomes  inadequate.   In  such  situations  the  structural  approach 
can  be  useful. 

In  the  structural  approach,  the  DESCRIPTION  is  a  formal  grammar 
(usually  a  phrase-structure  grammar)  in  which  terminals  are  certain  elementary 
parts  of  objects,  called  'primitives'.   The  types  of  relationships  which  can 
be  expressed  "naturally"  in  terms  of  a  formal  grammar  are,  however,  quite 
limited.   If  the  relevant  characteristics  include,  for  example,  some  numerical 
measurements  in  addition  to  relations  and  symbolic  concepts,  then  grammars 
involving  them  are  very  cumbersome  or  inadequate. 

This is a strong limitation, because in many problems an adequate
class description requires both numerical characterizations of objects and a
specification of various relationships among properties of objects and of
object parts, logical conditions on properties, etc.; i.e., it involves
descriptors of mixed arity, measured on different scales.  Thus, there is a
need for a method which can handle all such descriptors simultaneously.

Both the decision space approach and the syntactic approach tend
to produce descriptions which are not easily comprehensible by humans.  This
is so because these descriptions do not directly correspond to the 'natural
language type' descriptions which human experts would develop observing
the same data and which they would normally like to use.  Although in some
applications 'human comprehensibility' may not be important, in other
applications (e.g., in expert computer consulting systems) it is a crucial
requirement.

This  paper  presents  results,  still  early  and  limited,  of  an 
attempt  to  develop  a  uniform  conceptual  framework  and  an  implementation 
method  which  would  satisfy  both  of  the  above  requirements.   In  addition, 
an  important  aspect  of  this  method  is  that  the  final  descriptions  which 
it  produces  may  involve  new  descriptors  (variables  or  relations)  which 
were  not  included  in  the  initial  characterization  of  objects.   This  is 
achieved through the application of 'metarules' which represent the underlying
knowledge of the problem at hand and of the properties of descriptors
used in formulating the descriptions of exemplary data.  Therefore, the
approach taken is in the spirit of research in artificial intelligence.  The
method uses logic as the basic formal framework (specifically, a certain
syntactic extension of the first order predicate calculus, called variable-valued
logic system VL21), and is most closely related to the body of work termed
'computer induction'.  The ability to develop new descriptors, in addition
to those given a priori, places this work in the category of what we call
'constructive induction',* as opposed to 'non-constructive induction', in
which the final descriptions relate only descriptors initially provided.


*  The  author  thanks  Larry  Travis  of  the  University  of  Wisconsin  for 
suggesting  this  name. 

2.   RELATED  RESEARCH

It would be a very difficult task, requiring more space
than provided, to characterize adequately the various important
contributions to computer induction.  We give here only a very
limited, and certainly incomplete, review of some more recent work.

Many results consider inductive tasks within a specific
problem domain.  For example, programs collectively called METADENDRAL
[1] use a model-directed heuristic search to determine rules that
describe the molecular structure of an unknown chemical compound from
mass spectrometry data.  In [2] Winston describes a method for
determining a graph description of simple block structures from examples.
A program developed by Lenat [3] generates concepts (represented as
collections of a priori defined properties) of elementary mathematics,
under the guidance of a large body of heuristic rules.  Soloway and
Riseman [4] describe a method for creating multi-level descriptions
of a part of a baseball game, starting with 'snapshots' of the game,
and using rules representing general knowledge of the game.

Programs such as those mentioned above usually incorporate
a large body of task-specific knowledge and tend to perform quite well
on the tasks they were designed for.  They represent an important
achievement and demonstrate again that high performance requires specialized
solutions.  An important problem which they raise, however, is how
to untangle and systematize the ideas which they contribute, in order
to extend the understanding of inductive processes at large and to apply
these ideas in other problem areas.

A significant part of research has been concerned with
determining patterns in sequences of symbols (e.g., Simon [5],
Waterman [6]).  Simon [5] found that descriptions of such patterns
consistently incorporate only a few basic relations: 'some' and 'next'
between symbols, iterations between subpatterns, and hierarchic
phrase structure.  Gaines [7] developed a method for generating
finite-state automata which approximate a given symbol string and
represent different trade-offs between complexity and
poorness-of-fit.  Shaw, Swartout and Green [8] developed a program for
inferring Lisp code from a set of examples of Lisp statements.

The above works are related to the general subject of
grammatical inference (i.e., inference of a grammar which may have
produced a given set of strings).  Early work in this area was
concerned with the inference of a phrase structure grammar (e.g., Feldman
et al. [9]).  More recent work moves into inferring 'multi-dimensional'
grammars (e.g., work by Brayer and Fu [10]).

In recent years there has been a new trend toward the
development of general methods of induction.

Michalski and his collaborators (e.g., [11, 12, 13]) have
developed a methodology (using a sentential calculus with discrete
variables, called variable-valued logic system VL1, as a formal basis)
and computer programs for determining, from examples, generalized
discriminant descriptions of classes of objects that are optimal in
some sense.  The examples are presented as sequences of values of discrete
variables with an associated recognition class.  Work in a similar spirit,
although more limited in scope, was reported by Stoffel [14] (the elementary
statements used there are restricted to the 'variable-value' forms, i.e.,
to 'elementary selectors', as described in Section 4).

Many authors use a restricted form (usually quantifier-free)
of the first-order predicate calculus (FOPC) or some equivalent
notation as the formal framework for formulating hypotheses.  Morgan
[15] describes a formal method of hypothesis generation, called
f-resolution, which stems from deductive resolution principles.  Various
theoretical issues of induction in FOPC were considered by Plotkin [16].
Fikes, Hart and Nilsson [17] describe an algorithm for generalizing
robot plans.  Hayes-Roth and McDermott (e.g., [18]), and also Vere [19],
describe methods and computer programs for generating conjunctive
descriptions of least generality (which they call 'maximal abstractions')
of a set of objects represented by products of n-ary predicates.  The rules
of generalization which they use can be characterized as 'dropping a
condition' and 'turning constants into variables' (see section 5.3).  Related,
but different in spirit, is work by Zagoruiko [20] on a general method
for 'strengthening hypotheses' by narrowing the uncertainty
intervals of values of output variables, and work by Hedrick [21] on
determining production systems using a semantic net of predefined concepts.

This paper presents a theoretical framework for generalizing
and optimizing descriptions of object classes in the form of decision
rules.  The decision rules can involve descriptors of three different types
(nominal, linear and structured), employ some new syntactic forms, and use
problem knowledge for guiding induction and generating new descriptors.
The formal notation is a modification and extension of FOPC, called
variable-valued logic system VL21.  This formalism is claimed to be more
adequate than the traditional FOPC as a conceptual framework for describing the
inductive processes under consideration.  The paper is an extension
and modification of the report [22], and stresses the conceptual principles
of the induction method rather than specific algorithms and implementation
details.  Most of the latter are described in [23, 24, 25].

3.   PROBLEM  STATEMENT 

A VL transformation rule is defined as a rule

DESCRIPTION1  ⊳  DESCRIPTION2                                (2)

where DESCRIPTION1 and DESCRIPTION2 are expressions in the VL21 system
(section 4), and ⊳ stands for one of the transformation operators
which define the meaning of the rule.
A DESCRIPTION may look like:

∃x1,x2 [on-top(x1,x2)][size(x1) = 3..5][color(x2) = blue,yellow,red] ∧
       [length(x1) . length(x2) = small]

(For an explanation of the notation see section 4.)

We will consider here the following transformation operators:

(i)    ::>  the operator defines a decision rule.  DESCRIPTION2
       specifies a decision (or a sequence of decisions) which is
       assigned to a situation which satisfies DESCRIPTION1.
       (In the application to pattern recognition, DESCRIPTION2
       defines the recognition class.)  If a situation does not
       satisfy DESCRIPTION1, the rule assigns to it a NULL decision.

(ii)   =>  the operator defines an inference rule.  If a situation
       satisfies DESCRIPTION1, the rule assigns the truth-status 'TRUE'
       to DESCRIPTION2; otherwise the truth-status of DESCRIPTION2
       is '?'.  (In an inference rule DESCRIPTION1 is called the
       condition and DESCRIPTION2 is called the consequence.)

A decision rule can be viewed as a special case of an inference
rule, namely, when DESCRIPTION2 is a constant, an elementary selector,
or a product of elementary selectors involving decision variables (see
def. 2), and when its truth-status is TRUE (in general, it may not be TRUE).

(iii)  |<  the operator defines a generalization rule, which states
       that DESCRIPTION2 is more general than DESCRIPTION1,
       i.e., the set of situations which satisfy DESCRIPTION2
       is a superset of the set of situations satisfying DESCRIPTION1.

(iv)   |=  the operator specifies an equivalence preserving
       transformation rule (when the above mentioned sets are equal).
       The rule is a special case of a generalization rule.

The problem considered in this paper is defined as follows:

•  Given is

(a)  a set of VL decision rules, called data rules, which
     specify initial knowledge, {Cij}, about some situations
     (objects, processes, ...) and the recognition class,
     Ki, associated with them:

     C11 ::> K1,   C12 ::> K1,   ...,   C1,t1 ::> K1
     C21 ::> K2,   C22 ::> K2,   ...,   C2,t2 ::> K2
       ...                                                   (3)
     Cm1 ::> Km,   Cm2 ::> Km,   ...,   Cm,tm ::> Km

(b)  a set of VL inference rules which define a problem
     environment, i.e., represent knowledge about the
     recognition problem under consideration.  This includes
     value sets of descriptors used in the data rules, the
     properties of descriptors and their interrelationships
     characteristic of the problem at hand.

(c)  a  preference   (or  optimality)   criterion,   which  for  any 
two  'comparable'  sets  of  decision  rules  specifies  which 
one  is  more  preferable,  or  states  that  they  are  equally 
preferable. 

•  The problem is to determine, through an application of generalization
rules (sec. 5.3), a new set of decision rules (called output rules or
hypotheses):

     C'11 ::> K1,   C'12 ::> K1,   ...,   C'1,r1 ::> K1
     C'21 ::> K2,   C'22 ::> K2,   ...,   C'2,r2 ::> K2
       ...                                                   (4)
     C'm1 ::> Km,   C'm2 ::> Km,   ...,   C'm,rm ::> Km


which  are  most  preferable  among  all  sets  of  rules  that  do  not  contradict  the 
problem  environment  rules,  and  with  regard  to  the  input  rules  are  consistent 
and  complete. 

The  output  rules  are  consistent   with  regard  to  input  rules,  if  for 
any  situation  to  which  the  input  rules  assign  a  non-NULL  decision,  the  output 
rules  assign  to  it  the  same  decision,  or  the  NULL  decision. 

The output rules are complete with regard to input rules, if for any
situation to which the input rules assign a non-NULL decision, the output
rules also assign to it a non-NULL decision.

It  is  easy  to  see   that  if  the  output  rules  are  consistent  and 
complete  with  regard  to  the  input  rules   then  they  are  semantically  equivalent 
(i.e., assign  the  same  decision  to  the  same  situation)   or  more  general  than 
the  input  rules  (i.e.,  they  may  assign  a  non-NULL  decision  to  situations  to 
which  the  input  rules  assign  a  NULL  decision). 
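
When the set of situations is finite, the two definitions above can be checked mechanically. The following is a minimal Python sketch under stated assumptions: a rule set is modeled as a function from a situation to a decision, `None` stands for the NULL decision, and the names `is_consistent`, `is_complete` and the toy rule sets are illustrative only, not part of the paper.

```python
from typing import Callable, Hashable, Iterable, Optional

Decision = Optional[str]                      # None plays the role of NULL
RuleSet = Callable[[Hashable], Decision]      # situation -> decision


def is_consistent(inp: RuleSet, out: RuleSet,
                  situations: Iterable[Hashable]) -> bool:
    """Wherever the input rules decide, the output rules give the
    same decision or the NULL decision."""
    return all(out(s) in (inp(s), None)
               for s in situations if inp(s) is not None)


def is_complete(inp: RuleSet, out: RuleSet,
                situations: Iterable[Hashable]) -> bool:
    """Wherever the input rules decide, the output rules decide too."""
    return all(out(s) is not None
               for s in situations if inp(s) is not None)


# Hypothetical toy problem: situations are the integers 0..9.
def input_rules(s: int) -> Decision:
    if s in (1, 2):
        return "K1"
    if s == 7:
        return "K2"
    return None


def output_rules(s: int) -> Decision:
    # More general than input_rules: also decides for 0, 3, 4, 6, 8, 9.
    if s <= 4:
        return "K1"
    if s >= 6:
        return "K2"
    return None


situations = list(range(10))
consistent = is_consistent(input_rules, output_rules, situations)
complete = is_complete(input_rules, output_rules, situations)
```

In this toy case both checks succeed, and `output_rules` is strictly more general than `input_rules` because it assigns non-NULL decisions to situations that the input rules leave undecided.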

From  a  given  set  of  data  rules  it  is  usually  possible  to  derive 
many  different  sets  of  rules  which  are  consistent  and  complete  and  which 
satisfy the problem environment rules.  The role of the preference
criterion is to select one set of rules (or a few alternative sets) which is
most desirable in the given application.  The preference criterion
may refer to the simplicity of the rules (defined in some way), their
generality, the cost of measuring the information needed for rule
evaluation, degree of approximation to the given facts, etc. (section
5.4).  In this paper we accept the restriction that the DESCRIPTIONs
Cij and C'ij are disjunctive simple VL21 expressions (section 4).  Such
expressions have a very simple interpretation, and seem to be sufficient for
many applications.

4.   VL21  EXPRESSIONS  AS  DESCRIPTIONS
4.1  Definition  of  VL21

Data rules, hypotheses, problem environment descriptions,
and generalization rules are all expressed using the same formalism,
that of variable-valued logic calculus VL21.*  VL21 is an extension
of predicate calculus designed to facilitate a compact and uniform
expression of descriptions of different degrees and different types
of generalization.  The formalism also provides a simple linguistic
interpretation of descriptions without losing the precision of the
conventional predicate calculus.  To make the paper self-contained, we
will provide here a brief description of VL21.

There are three major differences between VL21 and the first
order predicate calculus:

1.   In  place  of  predicates,  it  uses  selectors   (or  relational 
statements)    as  basic  operands.  A  selector,  in  the  most 
general  form,  specifies  a  relationship  between  one  or 
more  atomic  functions  and  other  atomic  functions  or 
constants.  A  common  form  of  a  selector  is  a  test  to 
ascertain  whether  the  value  of  an  atomic  function  is  a 
specific  constant  or  is  a  member  of  a  set  of  constants. 
The  selectors  represent  compactly  certain  types  of 
logical  relationships  which  cannot  be  directly  represented
in  FOPC  but  which  are  common  in  human  descriptions.   They 
are  particularly  useful  for  representing  changes  in  the  degree 
of  generality  of  descriptions  and  for  syntactically  uniform 
treatment  of  descriptors  of  different  types. 


* VL21 is a subset of a more complete system, VL2, under development.

2.  Each atomic function (a variable, a predicate, a function)
    is assigned a value set (domain), from which it draws values,
    together with a characterization of the structure of the value set.
    This feature facilitates a representation of the semantics
    of the problem and the application of generalization rules appropriate
    to the type of descriptors.

3.  An expression in VL21 can have a truth-status: TRUE, FALSE or
    ? (UNKNOWN).  The truth-status '?' provides an interpretation
    of a VL21 description in a situation in which, e.g., outcomes of
    some measurements are not known.

Definition  1:   An  atomic  function   is  a  variable,  or  a  function  symbol  followed 
by  a  pair  of  parentheses  which  enclose  a  sequence  of  atomic  functions 
and/or  constants.   Atomic  functions  which  have  a  defined  interpretation 
in  the  problem  under  consideration  are  called  descriptors. 

A  constant   differs  from  a  variable  or  a  function  symbol  in  that 
its  value  set  is  empty.  If  a  confusion  is  possible,  a  constant  is  typed 
in  quotes. 

Examples

Constants:      2    *    red

Atomic forms:   x1    color(red)    on-top(p1,p2)    f(x1, g(x2))

Exemplary
value sets:     D(x1) = {0, 1, ..., 10}
                D(color) = {red, blue, ...}
                D(on-top) = {true, false}
                D(f) = {0, 1, ..., 20}

Definition 2:  A selector is a form

[L # R]                                                      (5)

where  L - called the referee, is an atomic function, or a sequence of atomic
           functions separated by '.'.  (The operator '.' is called the
           internal conjunction.)

       # - is one of the following relational operators:
              =   ≠   ≥   >   ≤   <

       R - called the reference, is a constant or atomic function, or a
           sequence of constants or atomic functions separated by operator
           ',' or '..'.  (The operators ',' and '..' are called the
           internal disjunction and the range operator, respectively.)

A selector in which the referee L is a simple atomic function and
the reference R is a single constant is called an elementary selector.  The
selector has truth-status TRUE {or FALSE} with regard to a situation if the
situation satisfies {does not satisfy} the selector, i.e., if the referee L
is {is not} related by # to the reference R.  The selector has the
truth-status '?' (and is interpreted as being a question), if there is not
sufficient information about the values of descriptors in L for the given
situation.  To simplify the exposition, instead of giving a definition of what
it means that 'L is related by # to R', we will simply explain this by
examples.  (See section 5.1 for more details.)


(i)     [color(box1) = white]               color of box1 is white
(ii)    [length(box1) ≥ 2]                  length of box1 is greater than or
                                            equal to 2
(iii)   [weight(box1) = 2..5]               weight of box1 is between 2 and 5
(iv)    [blood-type(P1) = O,A,B]            blood-type of P1 is O or A or B
(v)     [on-top(box1,box2) = T]             box1 is on top of box2
        or simply [on-top(box1,box2)]
(vi)    [above(box1,box2) = 3"]             box1 is 3" above box2
(viii)  [weight(box1) > weight(box3)]       the weight of box1 is greater than
                                            the weight of box3
(ix)    [length(box1) . length(box2) = 3]   the length of box1 and box2 is 3
(x)     [type(P1) . type(P2) = A,B]         the type of P1 and the type of P2
                                            is either A or B

Note the direct correspondence of the syntactic forms to linguistic
descriptions.  Note also that some selectors cannot be expressed in FOPC
in a (pragmatically) equivalent form (e.g., (iv), (ix), (x)).
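
The selector examples can be made concrete with a small evaluator. The Python sketch below is only an illustration under several assumptions that are not in the paper: an event is a dictionary from descriptor strings such as "weight(box1)" to values, the internal conjunction requires every referee descriptor to be related to the reference, the internal disjunction requires a match with at least one reference element, and '≠' is treated as the negation of '='. The names `Selector`, `Range` and `related` are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, Tuple

TRUE, FALSE, UNKNOWN = "T", "F", "?"
COMPARE = {"=": operator.eq, ">=": operator.ge, ">": operator.gt,
           "<=": operator.le, "<": operator.lt}


@dataclass(frozen=True)
class Range:
    """Reference element written 'lo..hi' (bounds inclusive)."""
    lo: Any
    hi: Any


def related(value: Any, op: str, reference: Tuple[Any, ...]) -> bool:
    """True if value is related by op to some element of the reference."""
    if op == "!=":
        return not related(value, "=", reference)
    for ref in reference:
        if isinstance(ref, Range):
            # Assumption: ranges are only meaningful with '='.
            if op == "=" and ref.lo <= value <= ref.hi:
                return True
        elif COMPARE[op](value, ref):
            return True
    return False


@dataclass(frozen=True)
class Selector:
    referee: Tuple[str, ...]     # descriptors joined by internal conjunction
    op: str
    reference: Tuple[Any, ...]   # constants joined by internal disjunction

    def status(self, event: Dict[str, Any]) -> str:
        missing = False
        for descriptor in self.referee:
            if descriptor not in event:
                missing = True               # value not known in this event
            elif not related(event[descriptor], self.op, self.reference):
                return FALSE                 # one known value already fails
        return UNKNOWN if missing else TRUE


event = {"color(box1)": "white", "weight(box1)": 4,
         "length(box1)": 3, "length(box2)": 3}

s_color = Selector(("color(box1)",), "=", ("white",))
s_weight = Selector(("weight(box1)",), "=", (Range(2, 5),))
s_length = Selector(("length(box1)", "length(box2)"), "=", (3,))
s_blood = Selector(("blood-type(P1)",), "=", ("O", "A", "B"))
```

With this event the first three selectors evaluate to TRUE, while `s_blood` evaluates to '?' because the event carries no value for blood-type(P1).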

A VL21 expression (or, here, simply VL expression) is defined by
the following rules:

(i)    A constant TRUE, FALSE or '?' is a VL expression.
(ii)   A selector is a VL expression.
(iii)  If V, V1 and V2 are VL expressions then so are:

       (V)                  formula in parentheses
       ¬V                   inverse
       V1 ∧ V2  or  V1V2    conjunction
       V1 ∨ V2              disjunction
       V1 ⊻ V2              exclusive disjunction
       V1 \ V2              exception
       V1 ⊳ V2              metaimplication,
                            where ⊳ ∈ {→, ↔, ::>, =>, |<, |=}
                            (implication, equivalence, decision assignment,
                            inference, generalization, semantical equivalence)
       ∃x1,x2,...,xk (V)    existentially quantified expression
       ∀x1,x2,...,xk (V)    universally quantified expression

A VL formula can have truth-status TRUE (T), FALSE (F) or UNKNOWN (?).
The interpretation given to connectives ¬, ∧, ∨, → is defined in Fig. 1.  (This
interpretation is consistent with Kleene-Körner 3-valued logic.)  An expression
with the operator =>, |< or |= is assumed to always have the truth-status TRUE,
and with the operator ::>, TRUE or ?.  Operators \, ⊻ and ↔ are interpreted:

   V1 \ V2   is equivalent to   V1(¬V2)
   V1 ⊻ V2   is equivalent to   (V1 ∨ V2) \ V1V2
   V1 ↔ V2   is equivalent to   (V1 → V2)(V2 → V1)
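
The three-valued connectives and the derived operators can be sketched in a few lines of Python. This is an illustration only: the truth tables of Fig. 1 are not available here, so the sketch assumes the strong Kleene tables (a conjunction is FALSE as soon as one operand is FALSE, and implication is read as "not a, or b"). The function names are hypothetical.

```python
T, F, U = "T", "F", "?"          # TRUE, FALSE, UNKNOWN


def neg(a: str) -> str:
    """Inverse: swaps T and F, leaves '?' unchanged."""
    return {T: F, F: T, U: U}[a]


def conj(a: str, b: str) -> str:
    """Conjunction (assumed strong Kleene table)."""
    if F in (a, b):
        return F
    return U if U in (a, b) else T


def disj(a: str, b: str) -> str:
    """Disjunction, obtained from conjunction by De Morgan's law."""
    return neg(conj(neg(a), neg(b)))


def impl(a: str, b: str) -> str:
    """Implication, assumed here to be (not a) or b."""
    return disj(neg(a), b)


def exception(a: str, b: str) -> str:
    """V1 \\ V2, i.e. V1 and not V2."""
    return conj(a, neg(b))


def xor(a: str, b: str) -> str:
    """Exclusive disjunction: (V1 or V2) \\ (V1 and V2)."""
    return exception(disj(a, b), conj(a, b))


def equiv(a: str, b: str) -> str:
    """Equivalence: (V1 -> V2) and (V2 -> V1)."""
    return conj(impl(a, b), impl(b, a))
```

Defining the derived operators through `conj`, `neg` and `impl` mirrors the equivalences stated in the text, so any change to the assumed base tables propagates to them automatically.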

The truth-status of ∃x(V) is

   TRUE {FALSE}   if, in a given situation, there exists
                  {does not exist} a value of x which makes
                  the truth-status of V equal TRUE
   ?              if it is not known whether there exists . . .

Figure 1

The truth-status of ∀x(V) is

   TRUE {FALSE}   if for every value of x in a given situation,
                  the truth-status of V is {is not} TRUE
   ?              if it is not known whether for every . . .



A constant * ('irrelevant') is introduced to substitute for R, in
a selector [L = R], when R is the sequence of all possible values that L can
take.

A VL expression in the form

QF1, QF2, ... (P1 ∨ P2 ∨ ... ∨ Pl)                           (7)

where QFi is a quantifier form ∃x1,x2,... or ∀x1,x2,..., and Pi is a
conjunction of selectors (a term), is called a disjunctive simple VL expression
(a DVL expression).

5.   INFERENCE  AND  GENERALIZATION  RULES
5.1  Interpretation  of  Inference  Rules

An inference rule

DESCRIPTION1  =>  DESCRIPTION2                               (8)

is used by applying it to situations.  A situation is, in general, a source
of information about values of variables and atomic functions in DESCRIPTION1
(the condition part of the rule).  A situation can, e.g., be a data base
storing values of variables and procedures for evaluating atomic functions,
or it can be an object on which various tests are performed to obtain these
values.

A decision rule is viewed as a special case of an inference rule,
when DESCRIPTION2 (the consequence or decision part of the rule) is a
constant, an elementary selector, or a product of elementary selectors involving
decision variables (i.e., DESCRIPTION2 uniquely defines a decision or
a sequence of decisions).  The truth-status of the condition and decision
part of a rule, before applying it to a situation, is assumed to be UNKNOWN.

Let Q denote the set of all possible situations under consideration.
To characterize situations in Q, one determines a set S, called the descriptor
set, which consists of variables, predicates and atomic functions (called,
generally, descriptors) whose specific values can adequately characterize
(for the problem at hand) any specific situation.  We will assume here
that the arguments of atomic functions are single variables, rather
than other atomic functions.  A situation is characterized by an event, which
is a sequence of assignments (L:=v), where L is a variable or an atomic function
with specific values of arguments, and v is a value of the variable or atomic
function which characterizes the situation.  It is assumed that each descriptor
has a defined value set (domain) which contains all possible values the
descriptor can take for any situation in Q.  Certain descriptors may not
be applicable to some situations, and therefore it is assumed that a
descriptor in such cases takes the value NA, which stands for not applicable.
Thus, the domains of all descriptors always include by default the value
NA.  The set of all possible events for the given descriptor set S is called
the event space, and denoted E(S).  It should be noted that within a single event
certain variables (variables which are quantified in formulas) may be assigned
a number of different values, i.e., there may be more than one pair (L:=vi),
where L is a variable and vi, i = 1, 2, ..., represent different values.

An event e ∈ E(S) is said to satisfy a selector [f(x1,...,xk) # R]
iff the value of function f for values of xi, i = 1, 2, ..., k, as specified
in the event e, is related to R by #.  For example, the event

e:  (... x5:=a1, x6:=a2, f20(a1,a2):=5, ...)

satisfies the selector:

[f20(x5,x6) = 1,3,5]

A satisfied selector is assigned truth-status TRUE.  If an event
does not satisfy a selector then the selector is assigned truth-status FALSE.
If an event does not have enough information to establish whether a
selector is satisfied or not, then the selector has UNKNOWN truth-status
with regard to this event.

Let us assume first that the condition part of an inference rule is
a quantifier-free formula.  Interpreting the connectives ¬, ∧, ∨ as
described in figure 1, one can determine from the truth-status of selectors
the truth-status of the whole formula.  An event is said to satisfy a rule
iff an application of the condition part of the rule to the event gives the
formula truth-status TRUE.  Otherwise, the event is said to not satisfy
the rule.

Suppose now that the condition formula is in the form

∃x(V)

An application of this formula to an event e assigns status TRUE to the formula
iff there exists in e a value assigned to x such that V achieves status TRUE
(x may have a number of different values assigned to it).  For example, the
formula

∃part [color(part) = red]

is satisfied by the event:

e = (... part:=P1, color(P1):=blue, part:=P2, color(P2):=yellow,
     part:=P3, color(P3):=red ...)

If the condition part is a form

∀x(V)

then it is assigned status TRUE if every value of x in the event applied to
it satisfies V.
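
Quantified conditions can be evaluated over an event by iterating over the values assigned to the quantified variable. The Python sketch below illustrates this with the 'part' and 'color' example; the event layout (a list of values for the variable and a dictionary for the function) and the names `exists`, `forall` and `selector_color_red` are assumptions made for the illustration, as is the treatment of a missing color as '?'.

```python
T, F, U = "T", "F", "?"          # TRUE, FALSE, UNKNOWN


def selector_color_red(event, part):
    """Status of the selector [color(part) = red] for one value of 'part'."""
    color = event["color"].get(part)     # None: value not recorded
    if color is None:
        return U
    return T if color == "red" else F


def exists(event, var, body):
    """Status of 'there exists var such that body' over the event."""
    statuses = [body(event, value) for value in event[var]]
    if T in statuses:
        return T
    return U if U in statuses else F


def forall(event, var, body):
    """Status of 'for every var, body' over the event."""
    statuses = [body(event, value) for value in event[var]]
    if F in statuses:
        return F
    return U if U in statuses else T


# The event from the text: three parts, the third one red.
event = {"part": ["P1", "P2", "P3"],
         "color": {"P1": "blue", "P2": "yellow", "P3": "red"}}

some_part_is_red = exists(event, "part", selector_color_red)
every_part_is_red = forall(event, "part", selector_color_red)
```

For this event the existential condition is satisfied and the universal one is not; if the color of some part were missing and no part were known to be red, `exists` would return '?'.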

If the condition part assumes truth-status TRUE then the decision
part is assigned status TRUE.  When the decision part reaches status TRUE
then variables and functions which occur in it are assumed to have values
which make this formula TRUE.  These values may not, in general, be unique.

For example, suppose that V is a decision part with status TRUE:

V:  [p(x1,x2) = 2][x3 = 2..5][x5 = 7]

V is interpreted as a description of a situation in which p has value 2 (if a
specification of p(x1,x2) is known, then from it we can infer what values of
x1 and x2 might be), x3 has a value between 2 and 5, inclusively, and x5 has
value 7.  (Note that the formula does not give precise information about the
value of x3.)  After applying a formula to an event, the truth-status of the
condition and decision part returns to UNKNOWN.  The role of an inference rule
can then be described as follows: the rule is applied to an event, and if the
event satisfies the condition part, then an assignment of values to variables
and functions is made as defined by the decision part.  This assignment
defines a new event (or a set of events which satisfy the decision part).
Another inference rule now can be applied to this event (or set of events),
and if satisfied by it (or by all of them), a new assignment of values to
some variables and functions can be made.
Examples of VL inference rules:

[p(x1,x2) = 3][q(x2) = 2,5][x7 ≠ 0] => [d(y1) = 7][p(y1,y2) = 2]
∃x3([p(x1,x3) = 2..3][q(x7,x3) > 2]) ∨ [t(x1) = 1] => [d(y1) = 7]
TRUE => [p(x2,x7) = 2][x7 = 2,3,5]

5.2 Specification of the problem environment in the form of inference rules

Types  of  descriptors 

The  process  of  generalizing  a  description  depends  on  the  type  of 
descriptors  used  in  the  description.  The  type  of  a  descriptor  depends  on  the 
structure  of  the  value  set  of  the  descriptor.  We  distinguish  here  among  three 
different  structures  of  a  value  set: 

1. Unordered

Elements of the domain are considered to be independent
entities; no structure is assumed to relate them. A
variable or function symbol with this domain is called
nominal (e.g., blood-type).

2.  Linearly  Ordered 

The  domain  is  a  linearly  ordered  set.   A  variable  or 
function  symbol  with  this  domain  is  called  linear 
(e.g.,  military  rank,  temperature,  weight). 

3. Tree Ordered

Elements of the domain are ordered into a tree structure.
A predecessor node in the tree represents a concept which
is more general than the concepts represented by the
dependent nodes (e.g., the predecessor of nodes 'triangle,
rectangle, pentagon, etc.' may be a 'polygon'). A variable
or function symbol with such a domain is called structured.

Each descriptor (a variable or function symbol) is assigned
its type in the specification of the problem. In the case of structured
descriptors, the structure of the value set is defined by inference rules
(e.g., see eqs. (13), (14), (15)).

In  addition  to  assigning  to  each  variable  and  function  symbol  a  domain, 
one  defines  properties  of  variables  and  atomic  functions  characteristic  for  the 
given  problem.   They  are  represented  in  the  form  of  inference  rules.    Here  are 
a  few  examples  of  such  properties. 

1.   Restrictions  on  Variables 

Suppose that we want to represent a restriction on the event
space saying that if the value of variable x1 is 0 ('a person
does not smoke'), then the variable x3 is 'not applicable'
(x3 - kind of cigarettes the person smokes). This is
represented by a rule:

[x1 = 0] => [x3 = NA]

NA = not applicable

2.   Relationships  Between  Atomic  Functions 

For example, suppose that for any situation in a
given problem, the atomic function f(x1, x2) is
always greater than the atomic function g(x1, x2).
We represent this:

T => ∀x1,x2 [f(x1,x2) > g(x1,x2)]


3.   Properties  of  Predicate  Functions 

For example, suppose that the predicate function left is transitive.
We represent this:

∀x1,x2,x3 ([left(x1,x2)][left(x2,x3)] => [left(x1,x3)])

Other  types  of  relationships  characteristic  for  the  problem 
environment  can  be  represented  similarly. 
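
As an illustration of how such problem environment rules might be applied to data, the following Python sketch derives the consequences of the transitivity rule for left and applies the 'not applicable' restriction to an event. The function names, the dict encoding of events, and the sample values are assumptions made for this example only.

```python
from itertools import product

def transitive_closure(pairs):
    """Apply [left(x1,x2)][left(x2,x3)] => [left(x1,x3)] until nothing new is derived."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b), (c, d) in product(closure, repeat=2) if b == c}
        if derived <= closure:
            return closure
        closure |= derived

def apply_restriction(event):
    """Apply [x1 = 0] => [x3 = NA] to an event given as a dict (hypothetical encoding)."""
    result = dict(event)
    if result.get("x1") == 0:
        result["x3"] = "NA"
    return result

left_facts = {("A", "B"), ("B", "C"), ("C", "D")}
left_closure = transitive_closure(left_facts)
print(sorted(left_closure - left_facts))  # [('A', 'C'), ('A', 'D'), ('B', 'D')]
print(apply_restriction({"x1": 0, "x3": "menthol"}))  # x3 becomes 'NA'
```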
5.3. Generalization rules

In order to transform data rules (3) into hypotheses (4),
generalization rules are applied to data rules. A generalization rule
transforms one or more decision rules associated with the same
generalization class (which, in our case, is the same as recognition class)
into a new decision rule, which is equivalent to or more general than
the initial rules.

A  decision  rule 

V   ::>  K  (9) 

is  equivalent   to  a  set  of  decision  rules 

{V_i ::> K},   i = 1, 2, ...  (10)

if any event which satisfies at least one of the V_i, i = 1, 2, ...,
satisfies also V, and conversely. If the converse is not required, the
rule (9) is said to be more general than (10).
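
For a small finite event space, the two relations just defined can be checked by enumeration. The sketch below is only an illustration: the domains, the encoding of descriptions as Python predicates, and the function names are all assumptions, and exhaustive enumeration would not scale to realistic event spaces.

```python
from itertools import product

# Hypothetical finite event space: two variables with small value sets.
DOMAINS = {"x1": [0, 1, 2], "x2": ["a", "b"]}

def all_events():
    names = list(DOMAINS)
    for values in product(*(DOMAINS[n] for n in names)):
        yield dict(zip(names, values))

def covered(description, events):
    """Events (as hashable tuples) that satisfy the description."""
    return {tuple(sorted(e.items())) for e in events if description(e)}

def compare(V, Vs):
    """Classify rule V ::> K against the set {V_i ::> K}."""
    events = list(all_events())
    cov_V = covered(V, events)
    cov_Vs = set().union(*(covered(Vi, events) for Vi in Vs))
    if cov_V == cov_Vs:
        return "equivalent"
    if cov_Vs < cov_V:
        return "more general"
    return "neither"

V1 = lambda e: e["x1"] == 0 and e["x2"] == "a"
V2 = lambda e: e["x1"] == 1 and e["x2"] == "a"
V_same = lambda e: e["x1"] in (0, 1) and e["x2"] == "a"
V_wide = lambda e: e["x2"] == "a"

print(compare(V_same, [V1, V2]))  # equivalent
print(compare(V_wide, [V1, V2]))  # more general
```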

The  generalization  rules  are  applied  to  data  rules  under  the 
condition  of  preserving  consistency  and  completeness,  and  achieving 
optimality  according  to  the  preference  criterion.   A  basic  property  of  a 
generalization transformation is that the resulting rule may have UNKNOWN
truth-status  (is  a  hypothesis) ;  its  truth-status  has  to  be  tested  on 
new  data. 

Below  is  a  list  of  a  few  basic  generalization  rules  (K   denotes 
a  generalization  class) . 
Non-constructive  rules: 

(i) The extending reference rule

V[L = R1] ::> K   |<   V[L = R2] ::> K

where L - is an atomic function,
R1 ⊆ R2, and R1, R2 are subsets of the value set, D(L),
of descriptor L,
V - an arbitrary description (here a VL expression).
This is a generally applicable rule; the type of descriptor
L does not matter.

(ii) The dropping selector (or dropping condition) rule

V[L = R] ::> K   |<   V ::> K

This rule is also generally applicable. It is one of
the most commonly used rules for generalizing information.
It can be derived from rule (i), by assuming that R2 in
(i) is equal to the value set D(L). In this case the selector
[L = R2] always has truth-status TRUE, and as such can
be removed.
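
Rules (i) and (ii) can be sketched in Python as follows. The encoding is an assumption made for illustration: a description is a dict mapping each descriptor to its reference (a set of admitted values), the dict as a whole standing for the product of selectors; `DOMAIN`, `extend_reference` and `drop_selector` are hypothetical names.

```python
# Hypothetical value sets D(L) for two descriptors.
DOMAIN = {"color": {"red", "blue", "green"}, "size": {"small", "large"}}

def satisfies(event, description):
    """An event satisfies a description if every selector admits its value."""
    return all(event[d] in ref for d, ref in description.items())

def extend_reference(description, descriptor, new_ref):
    """Rule (i): replace reference R1 of a selector by a superset R2 within D(L)."""
    old_ref = description[descriptor]
    if not (old_ref <= new_ref <= DOMAIN[descriptor]):
        raise ValueError("new reference must satisfy R1 <= R2 <= D(L)")
    return {**description, descriptor: set(new_ref)}

def drop_selector(description, descriptor):
    """Rule (ii): remove the selector; same effect as extending it to D(L)."""
    return {d: ref for d, ref in description.items() if d != descriptor}

V = {"color": {"red"}, "size": {"small"}}
wider = extend_reference(V, "color", {"red", "blue"})
dropped = drop_selector(V, "color")

event = {"color": "blue", "size": "small"}
print(satisfies(event, V), satisfies(event, wider), satisfies(event, dropped))
# False True True
```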
(iii) The closing interval rule

V[L = a] ::> K  }
                }   |<   V[L = a..b] ::> K
V[L = b] ::> K  }

This rule is applicable only when L is a linear descriptor.

To illustrate the rule, consider as objects two states of a
machine, and as a recognition class, a characterization of the states as
normal. The rule says that if the states differ only in that the machine
has two different temperatures, say, a and b, then the hypothesis is made
that all states in which the temperature is in the interval [a,b] are
also normal.
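
A minimal Python sketch of the closing interval rule, assuming the linear domain is given as an explicitly ordered list (the function name and the sample domains are hypothetical):

```python
def close_interval(a, b, ordered_domain):
    """Rule (iii): from [L = a] and [L = b] hypothesize [L = a..b].

    `ordered_domain` lists the values of a linear descriptor in increasing order.
    """
    i, j = sorted((ordered_domain.index(a), ordered_domain.index(b)))
    return ordered_domain[i:j + 1]

TEMPERATURE = [60, 70, 80, 90, 100]
RANK = ["private", "corporal", "sergeant", "captain"]

print(close_interval(70, 90, TEMPERATURE))         # [70, 80, 90]
print(close_interval("sergeant", "private", RANK))  # ['private', 'corporal', 'sergeant']
```

The second call shows that the rule needs only the linear order of the domain, not numeric values.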

(iv) The climbing generalization tree rule

V[L = a] ::> K  }
V[L = b] ::> K  }  one or
   ...          }  more      |<   V[L = s] ::> K
V[L = i] ::> K  }  rules

where L is a structured descriptor,
s - represents the predecessor node (a concept at the
next 'level of generality') of nodes a, b, ... and i, in
the tree domain of L.
The rule is applicable only to selectors involving structured
descriptors. This rule has been used, e.g., in [2], [3], [21].
Example:

V[shape(p) = triangle] ::> K   }
                               }   |<   V[shape(p) = polygon] ::> K
V[shape(p) = rectangle] ::> K  }

(v) The extension against rule

V1[L = R1] ::> K   }
                   }   |<   [L ≠ R2] ::> K
V2[L = R2] ::> ¬K  }

where R1 ∩ R2 = ∅,
V1 and V2 - arbitrary descriptions.
This rule is of general applicability. It is used to take into
consideration 'negative examples', or, in general, to maintain
consistency. It is a basic rule for determining discriminant
class descriptions.
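
The extension against rule can be sketched as a set operation on references. The following Python fragment is an illustration under the assumption that references are finite sets and that the value set D(L) is known; `extend_against` is a hypothetical name.

```python
def extend_against(positive_ref, negative_ref, domain):
    """Rule (v): from V1[L = R1] ::> K and V2[L = R2] ::> not-K, with R1 and R2
    disjoint, hypothesize [L != R2] ::> K, i.e. the reference D(L) minus R2."""
    if positive_ref & negative_ref:
        raise ValueError("R1 and R2 must be disjoint")
    return domain - negative_ref

D_COLOR = {"red", "blue", "green", "yellow"}
extended = extend_against({"red"}, {"blue"}, D_COLOR)
print(sorted(extended))  # ['green', 'red', 'yellow']
```

The result is the largest reference for L that still covers the positive example and excludes the negative one.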


(vi) The 'turning constants into variables' rule

V[p(a,Y)] ::> K  }
V[p(b,Y)] ::> K  }  one or
   ...           }  more      |<   V[p(x,Y)] ::> K
V[p(i,Y)] ::> K  }  rules

where Y stands for one or more arguments of atomic
function p,
x is a variable whose value set includes a, b, ..., i.
This is a rule of general applicability. It is the basic
rule used in works on induction employing predicate
calculus.
Constructive rules:

Constructive rules generate descriptions of the data rules in
terms of certain new descriptors, and, therefore, are a form of
generalization rules. They also can be viewed simply as rules which generate new
descriptors ('metadescriptors'). There can be very many such rules.
We will restrict ourselves here to two examples. Some constructive rules
are encoded as specialized procedures.

(vii) The counting rule

V[attribute(P1) = A] ... [attribute(Pk) = A][attribute(Pk+1) ≠ A] ...
[attribute(Pr) ≠ A] ::> K   |<   V[#P-attribute-A = k] ::> K

where P1, P2, ..., Pk, ..., Pr - are constants denoting, e.g.,
parts of an object,

attribute - stands for a certain attribute
of P_i-s, e.g., color, size,
texture, etc.,

#P-attribute-A - denotes a new descriptor
interpreted as the 'number of P_i-s (e.g.,
parts) with attribute equal A'.

Example:

V[color(P1) = RED][color(P2) = RED][color(P3) = BLUE] ::> K

|<   V[#P-color-red = 2] ::> K

(The above is a generalization rule, because a set of objects with any
two red parts is a superset of a set of objects with two parts which are
red and one part which is blue.)
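
The counting rule amounts to tallying attribute values over the parts of an object. A minimal Python sketch, assuming the attribute values are given as a dict from part names to values (the function name and the naming pattern of the generated descriptors are modeled on the example above):

```python
from collections import Counter

def counting_descriptors(parts, attribute):
    """Build '#P-<attribute>-<value>' metadescriptors from per-part attribute values."""
    counts = Counter(parts.values())
    return {f"#P-{attribute}-{value}": n for value, n in counts.items()}

colors = {"P1": "RED", "P2": "RED", "P3": "BLUE"}
meta = counting_descriptors(colors, "color")
print(meta)  # {'#P-color-RED': 2, '#P-color-BLUE': 1}
```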
(viii) The generating chain properties rule

If the arguments of different occurrences of the same
relation (e.g., relation 'above', 'left-of', 'next',
etc.) form a chain, i.e., are linearly ordered by the
relation, the rule generates descriptors relating to specific
objects in the chain and computes their properties as
potentially relevant characteristics. For example:

LST-object - the 'least object', i.e., the object at the
beginning of the chain (e.g., the bottom
object in the case of relation 'above')

MST-object - the object at the end of the chain (e.g.,
the top object)

ith-object - the ith object of the chain.
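
The chain descriptors can be computed as in the Python sketch below. It assumes that each pair (a, b) reads 'a above b', so that the least object is the one that is above nothing; the function names and the dict layout of the result are hypothetical.

```python
def build_chain(pairs):
    """Order the objects linked by a relation into a chain, least object first.

    Each pair (a, b) is read as 'a <relation> b', e.g. 'a above b'.
    """
    firsts = {a for a, _ in pairs}
    seconds = {b for _, b in pairs}
    objects = firsts | seconds
    starts = seconds - firsts          # objects that are 'above' nothing
    if len(starts) != 1:
        raise ValueError("the relation does not form a single chain")
    successor = {b: a for a, b in pairs}
    chain = [next(iter(starts))]
    while chain[-1] in successor and len(chain) < len(objects):
        chain.append(successor[chain[-1]])
    if len(set(chain)) != len(objects) or chain[-1] in successor:
        raise ValueError("the relation does not form a single chain")
    return chain

def chain_descriptors(pairs):
    chain = build_chain(pairs)
    return {
        "LST-object": chain[0],
        "MST-object": chain[-1],
        "position": {obj: i for i, obj in enumerate(chain, start=1)},
    }

above = [("C", "B"), ("B", "A")]      # C is above B, B is above A
print(chain_descriptors(above))
# {'LST-object': 'A', 'MST-object': 'C', 'position': {'A': 1, 'B': 2, 'C': 3}}
```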

5.4 The preference criterion

The preference criterion defines what is the desired solution
to the problem, i.e., what kind of hypotheses are being sought. The
question of what should be the preference criterion is a broad subject
beyond the scope of the paper. We will, therefore, discuss here only
the underlying ideas behind the presented approach. First, we disagree
with many authors who seem to be searching for one universal criterion
which should guide induction. Our position is that there are many
dimensions, independent and interdependent, on which the hypotheses can
be evaluated. The weight given to each dimension depends on the ultimate
use of the hypotheses. Among these dimensions are various forms of
simplicity of the hypothesis (e.g., the number of operators in it, the
quantity of information required to encode the hypothesis using operators
from an a priori defined set [26], etc.), the scope of the hypothesis, which
relates the events predicted by the hypothesis to the events actually
observed (e.g., the 'degree of generalization' [12], the 'precision' [26]),
the cost of measuring the descriptors in the hypothesis, etc. Therefore,
instead of defining a specific criterion, we specify only a general form
of the criterion. The form permits a user to define for the inductive
program various specific criteria which are appropriate to the application. The
form, called a 'lexicographic functional', consists of an ordered list of
criteria (of dimensions of hypothesis quality) and a list of 'tolerances'
for these criteria [12, 23].
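
One way to read the lexicographic functional is as a sequence of filters: the first criterion keeps the hypotheses whose value lies within its tolerance of the best one, the survivors are passed to the second criterion, and so on. The Python sketch below follows this reading. Treating the tolerance as an absolute margin on a cost to be minimized is an assumption made here; the precise definition is given in [12, 23]. All names and sample values are hypothetical.

```python
def lexicographic_select(candidates, criteria):
    """Filter candidates through an ordered list of (cost_function, tolerance).

    Lower cost is better. At each step only candidates whose cost is within
    `tolerance` (assumed absolute) of the best remaining cost are kept.
    """
    remaining = list(candidates)
    for cost, tolerance in criteria:
        best = min(cost(c) for c in remaining)
        remaining = [c for c in remaining if cost(c) <= best + tolerance]
        if len(remaining) == 1:
            break
    return remaining

hypotheses = [
    {"name": "h1", "rules": 2, "selectors": 5},
    {"name": "h2", "rules": 1, "selectors": 4},
    {"name": "h3", "rules": 1, "selectors": 3},
    {"name": "h4", "rules": 2, "selectors": 2},
]
n_rules = lambda h: h["rules"]
n_selectors = lambda h: h["selectors"]

strict = lexicographic_select(hypotheses, [(n_rules, 0), (n_selectors, 0)])
loose = lexicographic_select(hypotheses, [(n_rules, 1), (n_selectors, 0)])
print([h["name"] for h in strict])  # ['h3']
print([h["name"] for h in loose])   # ['h4']
```

With zero tolerances the ordering is purely lexicographic; a non-zero tolerance on the first criterion lets a later criterion decide among near-ties.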

An important and somewhat surprising property of
such an approach is that by properly defining the preference criterion,
the same computer program can produce either the characteristic or
discriminant descriptions of object classes. The characteristic
description specifies the common properties shared by the objects of the
same class (most work on induction considers only this type of description,
e.g., [2], [5], [18]), while the discriminant description specifies only
the properties necessary for distinguishing the given class from all the
other classes (Michalski [12, 27], Larson [23]).

5.5  Arithmetic  descriptors 

In  addition  to  initial  linear  descriptors  used  in  the  data  rules, 
new  linear  descriptors  can  be  formulated  as  arithmetic  functions  of  the 
original  ones.   These  descriptors  are  formulated  by  a  human  expert  as 
suggestions  to  the  program. 

6. OUTLINE OF ALGORITHM AND OF COMPUTER IMPLEMENTATION

In this section we outline the top level algorithm for rule
induction and its implementation in the computer program INDUCE-1.1
([23], [24], [25]). The algorithm is illustrated by an example.

INDUCE-1.1 is considered to be only an aid to rule induction.
Its successful application to practical problems requires cooperation
between the program and an expert, whose role is to formulate data rules and
the problem environment rules, define the preference criterion and other
parameters, evaluate the obtained rules, repeat the process if desired, etc.


6.1 Computer representation of VL decision rules

Decision rules are represented as graphs with labeled nodes and
labeled directed arcs. A label on a node can be:

a) a selector with a descriptor without the argument list,

b) a logical operation,

c) a quantifier form (∃x or ∀x).

Arcs link arguments with selectors or descriptors, and are labeled by 0, 1, 2, ...,
to specify the position of an argument in the descriptor indicated at the head
of the arc (0 indicates that the order of arguments is not important).

Several  different  types  of  relations  may  be  represented  by  an  arc. 
The  type  of  relation  is  determined  by  the  label  on  the  node  at  each  end  of 
the  arc.   The  types  of  relations  are:   1)  functional  dependence,  2)  logical 
dependence,  3)  implicit  variable  dependence,  4)  scope  of  variables. 

Figure 2 gives a graph representing a VL21 expression. The two
arcs connected to the logical operation (∧) represent the logical dependence
of the value of the formula on the values of the two selectors. The other
arcs in the figure represent the functional dependence of f on x1 and x2,
and g on x2.


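[Figure 2. Graph representation of a VL21 expression, with selector nodes
[f = 1] and [g = 2], a quantifier node ∃x1, and a logical operation node.]

VL21 graph structure: ∃x1 ([f(x1,x2) = 1][g(x2) = 2])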
6.2. Outline of the Top Level Algorithm

The implementation of the inductive process in the program INDUCE-1
was based on ideas and algorithms adopted from the earlier research on the
generalization of VL expressions (Michalski [12, 27]), and some new ideas
and algorithms developed by Larson [23, 24].

The  top  level  algorithm  (in  somewhat  simplified  form)  can  be 
described  as  follows: 

1. At the first step, the data rules (whose condition parts are in the
disjunctive simple forms) are transformed to a new set of rules, in which
condition parts are in the form of c-expressions. A c-expression (a
conjunctive expression) is a product of selectors accompanied by one or
more quantifier forms, i.e., forms QF x1, x2, ..., where QF denotes a
quantifier. (Note that due to the use of the internal disjunction and
quantifiers, a c-expression represents a more general concept than a
conjunction of predicates (used, e.g., in [18], [19]).)

2. A decision class is selected, say K_i, and all c-expressions associated
with this class are put into a set F1, and all remaining c-expressions
are put into a set F0 (the set F1 represents events to be covered,
and set F0 represents constraints, i.e., events not to be covered).

3. By application of inference rules (describing the problem environment),
constructive generalization rules, and rules generating arithmetic
descriptors (sec. 5.5), new selectors are generated. The 'most promising'
selectors (according to a certain criterion) are added to the c-expressions
in F1 and F0.

4. A c-expression is selected from F1, and a set of consistent generalizations
(a restricted star) of this expression is obtained. This is done by starting
with single selectors (called 'seeds'), selected from this c-expression
as the 'most promising' ones (according to the preference criterion). In each
subsequent step, a new selector is added to the c-expression obtained in
the previous step (initially the seeds), until a specified number (parameter
NCONSIST) of consistent generalizations is determined. Consistency is
achieved when a c-expression has NULL intersection with the set F0. This
'rule growing' process is illustrated in fig. 3.

5. The obtained c-expressions, and c-expressions in F0, are transformed
to two sets E1 and E0, respectively, of VL events (i.e., sequences of
values of certain discrete variables).

A procedure for generalizing VL1 descriptions is then applied
to obtain the 'best cover' (according to a user defined criterion) of set
E1 against E0 (the procedure is a version of AQVAL/1 program [12]).

During this process, the extension against, the closing
interval and the climbing generalization tree rules are applied.

The result is transformed to a new set of c-expressions
(a restricted star) in which selectors have now appropriately generalized
references.

6. The 'best' c-expression is selected from the restricted star.

7. If the c-expression completely covers F1, then the process repeats for
another decision class. Otherwise, the set F1 is reduced to contain only the
uncovered c-expressions, and steps 4 to 7 are repeated.
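
The outer loop of steps 2, 4, 6 and 7 can be sketched in Python as below. This is a heavily simplified illustration and not the INDUCE-1.1 implementation: events are plain attribute-value dicts without quantifiers, the reference generalization of step 5 is omitted, and `grow_rule` is a greedy stand-in for the restricted star of step 4. All names and data are hypothetical.

```python
def covers(rule, event):
    """A rule (descriptor -> set of admitted values) covers an event."""
    return all(event[d] in ref for d, ref in rule.items())

def consistent(rule, negatives):
    """Consistency: the rule covers no event of F0."""
    return not any(covers(rule, e) for e in negatives)

def grow_rule(seed, negatives):
    """Stand-in for steps 4 to 6: add selectors of the seed one at a time,
    preferring those matched by the fewest negative events, until consistent."""
    rule = {}
    order = sorted(seed, key=lambda d: sum(e[d] == seed[d] for e in negatives))
    for d in order:
        rule[d] = {seed[d]}
        if consistent(rule, negatives):
            return rule
    raise ValueError("seed cannot be separated from the negative events")

def cover_class(positives, negatives):
    """Step 7: repeat until every event of F1 is covered by some rule."""
    uncovered = list(positives)
    rules = []
    while uncovered:
        rule = grow_rule(uncovered[0], negatives)
        rules.append(rule)
        uncovered = [e for e in uncovered if not covers(rule, e)]
    return rules

F1 = [{"shape": "closed", "length": "short", "wheels": 2},
      {"shape": "open", "length": "short", "wheels": 3}]
F0 = [{"shape": "open", "length": "long", "wheels": 2},
      {"shape": "closed", "length": "long", "wheels": 2}]

print(cover_class(F1, F0))  # [{'length': {'short'}}]
```

Each pass removes at least the seed event from the uncovered set, so the loop terminates whenever every positive event can be separated from F0.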

The implementation of the inductive process in INDUCE-1.1 consists
of a large collection of specialized algorithms, each accomplishing a certain
task. Among the most important tasks are:

1. the implementation of the 'rule growing process',

2. testing whether one c-expression is a generalization of
('covers') another c-expression. This is done by testing for subgraph
isomorphism,

3. generalization of a c-expression by extending the selector
references and forming irredundant c-expressions (includes application
of AQVAL/1 procedure),

4. generation of new descriptors and new selectors.

Program INDUCE-1.1 has been implemented in PASCAL (for Cyber
175 and DEC 10); its complete description is given in [25].

[Figure 3. Illustration of the rule growing process (an application of
the dropping selector rule in the reverse order). Nodes denote a discarded
c-rule, an active c-rule, or a terminal node denoting a consistent c-rule.
Each arc represents an operation of adding a new selector to a c-rule.

The branching factor is determined by parameter ALTER. The
number of active rules (which are maintained for the next step of the
rule growing process) is specified by parameter MAXSTAR. The number of
terminal nodes (consistent generalizations) which the program attempts to
generate is specified by parameter NCONSIST.]

6.3.   Example 

We will present now an example illustrating some of the features
of INDUCE-1.1. Suppose we are given two sets of trains, Eastbound and Westbound,
as shown in fig. 4. The problem is to determine a concise (logically
sufficient) description of each set of trains, which distinguishes one set
from the other (i.e., a discriminant description which contains only necessary
conditions for distinguishing between the two sets).

As the first step, an initial set of descriptors is determined
for describing the trains. Eleven descriptors are selected in total.
Among them:

• infront(car_i, car_j) - car_i is in front of car_j
(a nominal descriptor)

• length(car_i) - the length of car_i
(a linear descriptor)

• car-shape(car_i) - the shape of car_i
(a structured descriptor with 12 nodes in the
generalization tree; see eqs. (13) and (14))

• cont-load(car_i, load_j) - car_i contains load_j
(a nominal descriptor)

• load-shape(load_i) - the shape of load_i
(a structured descriptor)
The value set: circle, hexagon, triangle, rectangle; the values
hexagon, triangle and rectangle have the common predecessor polygon

• nrpts-load(car_i) - the number of parts in the load of car_i
(a linear descriptor)

• nrwheels(car_i) - number of wheels in car_i
(a linear descriptor)

[Figure 4. The two sets of trains: eastbound trains and westbound trains.]

The data rules consist of descriptions of the individual
trains in terms of the selected descriptors, together with the
specification of the train set they belong to. For example, the data
rule describing the second eastbound train is:

∃car1, car2, car3, car4, load1, load2, ...
[infront(car1,car2)][infront(car2,car3)] ... [length(car1) = long] ∧
[car-shape(car1) = engine][car-shape(car2) = U-shaped][cont-load(car2,load1)] ∧   (12)
[load-shape(load1) = triangle] ... [nrwheels(car…) = …] ... ::> [class = Eastbound]

Rules  describing  the  problem  environment  in  this  case  are  only 
rules  defining  structures  of  structured  descriptors  (arguments  of  descriptors 
are  omitted) : 
[car-shape = open rctngl, open trapezoid, U-shaped, dbl open rctngl] => [car-shape = open top]   (13)
[car-shape = ellipse, closed rctngl, jagged top, sloping top] => [car-shape = closed top]   (14)
[load-shape = hexagon, triangle, rectangle] => [load-shape = polygon]   (15)
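
The structure rules (13) and (14) define the generalization tree that the climbing generalization tree rule operates on. The Python sketch below encodes them as a child-to-parent table and finds the lowest node that generalizes a set of values. The table holds only the nodes named in (13) and (14), and the function names are hypothetical. Note that rule (iv) as stated climbs one level to the predecessor node, while this sketch returns the lowest common node, which may be the value itself.

```python
# Child -> parent table built from rules (13) and (14).
STRUCTURE = {
    "open top": ["open rctngl", "open trapezoid", "U-shaped", "dbl open rctngl"],
    "closed top": ["ellipse", "closed rctngl", "jagged top", "sloping top"],
}
PARENT = {child: parent for parent, children in STRUCTURE.items() for child in children}

def ancestors(value):
    """The value itself followed by its predecessors, lowest first."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def climb(values):
    """Lowest node generalizing all given values, or None if there is none."""
    common = None
    for v in values:
        path = ancestors(v)
        common = path if common is None else [n for n in common if n in path]
    return common[0] if common else None

print(climb(["ellipse", "jagged top"]))  # closed top
print(climb(["ellipse", "U-shaped"]))    # None (no common node in this table)
```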

The  criterion  of   preference   was  to  minimize  the  number  of  rules 
(c-expressions)  in  describing  each  class,  and,  with  secondary  priority, 
to  minimize  the  number  of  selectors  in  each  rule. 

(At this moment, before proceeding further, the reader is advised to
look at the pictures and to try to solve this problem on his/her own.)

Rules of constructive generalization included in the program are
able to construct, among other descriptors, such descriptors as the length
of a chain, properties of elements of a chain, number of objects satisfying a
certain relation, etc. For example, from the data rule (12), the constructive
generalization rules can produce new selectors such as:

[nrcars = 4] - the number of cars in the train is 4
(the length of chain defined by relation infront)

[nrcars-length-long = 1] - the number of long cars is 1 (the engine)

[nrpts-load(last-car) = 2] - the number of parts in the load of the last
car is 2

[position(car_i) = i] - the position of car_i is i

Suppose that eastbound trains are considered first. The
set F1 then contains all c-expressions describing eastbound trains,
and F0 all c-expressions describing westbound trains. The description
e is selected from F1 (suppose it is the above description of the second
eastbound train), and supplemented by 'most promising' metadescriptors
generated by problem environment rules and constructive generalization
rules. In this case, the metaselector [shape(last-car) = rectangle] is added
to e. Next, a set G (a restricted star) of a certain number (NCONSIST) of
consistent generalizations of e is determined.

This is done by forming a sequence of partial stars (a partial
star may include inconsistent generalizations of e). If an element of a
partial star is consistent, it is placed into the set G. The initial
partial star (P1) contains the set of all selectors of e. This partial
star and each subsequent partial star is reduced according to a user
specified preference criterion to the 'best' subset, before a new partial
star is formed. The size of the subset is controlled by a parameter called
MAXSTAR. A new partial star P_(i+1) is formed from an existing partial star
P_i in the following way: for each c-expression in P_i, a set of c-expressions
is placed into P_(i+1), each new c-expression containing the selectors of the
original c-expression plus one new selector from e which is not in the original
c-expression. Once a sufficient number of consistent generalizations have been
formed, a version of the AQVAL/1 program (Michalski [12]) is
applied to extend the references of all selectors in each consistent
generalization. As the result, some selectors may be removed and some
may have more general references.
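
The formation of partial stars can be sketched as a beam search over sets of selectors. The Python fragment below is an illustration under simplifying assumptions and not the INDUCE-1.1 code: events are flat attribute-value dicts without quantifiers, the 'best' subset is chosen by the number of negative events still covered, and the parameter ALTER and the final reference extension are omitted. The names `grow_star`, `maxstar` and `nconsist` mirror the text; the data are hypothetical.

```python
def covers(rule, event):
    """A rule is a frozenset of (descriptor, value) selectors."""
    return all(event.get(d) == v for d, v in rule)

def grow_star(seed, negatives, maxstar=3, nconsist=2):
    """Return up to `nconsist` consistent generalizations of the seed event."""
    selectors = sorted(seed.items())

    def n_covered(rule):
        return sum(covers(rule, e) for e in negatives)

    G = []                                            # consistent generalizations
    partial = {frozenset([s]) for s in selectors}     # initial partial star
    while partial and len(G) < nconsist:
        # Consistent elements of the partial star are placed into G.
        for rule in sorted(partial, key=lambda r: (len(r), sorted(r))):
            if n_covered(rule) == 0 and rule not in G:
                G.append(rule)
        # Reduce the inconsistent remainder to the 'best' MAXSTAR elements.
        active = sorted((r for r in partial if n_covered(r) > 0),
                        key=lambda r: (n_covered(r), sorted(r)))[:maxstar]
        # Form the next partial star by adding one new selector of the seed,
        # skipping rules that merely specialize something already in G.
        partial = {r | {s} for r in active for s in selectors if s not in r}
        partial = {r for r in partial if not any(g <= r for g in G)}
    return G[:nconsist]

seed = {"shape": "rectangle", "length": "short", "wheels": "2"}
F0 = [{"shape": "rectangle", "length": "long", "wheels": "2"},
      {"shape": "ellipse", "length": "short", "wheels": "2"}]

for rule in grow_star(seed, F0):
    print(sorted(rule))  # [('length', 'short'), ('shape', 'rectangle')]
```

In this toy run no single selector is consistent, so the first partial star contributes nothing to G; the second one yields a two-selector rule.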

In the example, the best subset of selectors of e (i.e., the
reduced partial star (P1)) was:

∃car1 [car-shape(car1) = U-shaped]   (16)

∃car1 [car-shape(car1) = open trapezoid]   (17)

∃car1 [car-shape(car1) = rectangle]   (18)

[car-shape(last-car) = rectangle]   (19)

The last c-expression is consistent (has empty intersection with
c-expressions in F0) and, therefore, is placed in G. From the remaining,
a new partial star is determined. This new partial star contains a
consistent generalization:

∃car1 [car-shape(car1) = rectangle][length(car1) = short]   (20)

which is added to G. Suppose G is restricted to have only two elements
(NCONSIST = 2). Now, the program AQVAL/1 is applied to generalize references
of the selectors in c-expressions of G, if it leads to an improvement
(according to the preference criterion).

In this case, a generalization of (20) produces a consistent and
complete generalization:

∃car1 [car-shape(car1) = closed top][length(car1) = short]   (21)

(the generalization of (19), [car-shape(last-car) = polygon], is not
complete; it does not cover all of F1).

In this example, only 2 partial stars were formed, and two
consistent generalizations were created. In general, a set of consistent
generalizations is created through the formation of several partial stars.
The size of each partial star and the number of alternative generalizations
are controlled by user supplied parameters.

Assuming a larger value of NCONSIST, and applying the above
procedure to both decision classes, the program INDUCE-1.1 produced the
following alternative descriptions of each set of trains:

(The selectors or references underlined by a dotted line were
generated by application of constructive generalization rules or problem
environment rules).

Eastbound trains:

∃car1 [length(car1) = short][car-shape(car1) = closed top] ::> [class = Eastbound]   (22)

(the same as (21)). It can be interpreted:

If a train contains a car which is short and has a closed top,
then it is an eastbound train.

∃car1, car2, load1, load2 [infront(car1,car2)][cont-load(car1,load1)]
∧ [cont-load(car2,load2)][load-shape(load1) = triangle]
∧ [load-shape(load2) = polygon] ::> [class = Eastbound]   (23)

It can be interpreted:

If a train contains a car whose load is a triangle, and the load of the
car behind is a polygon, then the train is eastbound.

Westbound trains:

[nrcars = 3] ∨ ∃car1 [car-shape(car1) = jagged-top] ::> [class = Westbound]   (24)

∃car1 [nrcars-length-long = 2][position(car1) = 3][shape(car1) = open-top, jagged-top]
::> [class = Westbound]   (25)
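
To show how such rules are applied, the Python sketch below evaluates rules (22) and (24) on two trains. The trains are invented for this illustration and are not the trains of fig. 4; the list-of-dicts encoding, the shape spelling taken from rule (14), and the function names are assumptions.

```python
# Values of car-shape that generalize to 'closed top' according to rule (14).
CLOSED_TOP = {"ellipse", "closed rctngl", "jagged top", "sloping top"}

def rule_22(train):
    """Rule (22): some car is short and has a closed top."""
    return any(car["length"] == "short" and car["car-shape"] in CLOSED_TOP
               for car in train)

def rule_24(train):
    """Rule (24): the train has three cars, or some car has a jagged top."""
    return len(train) == 3 or any(car["car-shape"] == "jagged top" for car in train)

# Hypothetical trains; the engine is counted as a car, as in data rule (12).
train_a = [
    {"car-shape": "engine", "length": "long"},
    {"car-shape": "closed rctngl", "length": "short"},
    {"car-shape": "open rctngl", "length": "long"},
    {"car-shape": "U-shaped", "length": "short"},
]
train_b = [
    {"car-shape": "engine", "length": "long"},
    {"car-shape": "jagged top", "length": "long"},
    {"car-shape": "open rctngl", "length": "short"},
]

print(rule_22(train_a), rule_24(train_a))  # True False
print(rule_22(train_b), rule_24(train_b))  # False True
```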

It is interesting to note that the example was constructed with
rules (23) and (24) in mind. The rule (22) found by the program as an
alternative was rather surprising because it seems to be conceptually
simpler than rule (23). This shows that the combinatorial part of
an induction process can be successfully handled by a computer
program, and, therefore, programs like the above have a potential to
serve as an aid to induction processes in various applied sciences.

7. SUMMARY

We have presented an approach to pattern recognition which
views it as knowledge-guided computer induction. Let us briefly
review the main advantages and limitations of this approach. Among the
advantages are the generality of the method and the simplicity of
interpretation of the pattern recognition rules. More specifically,
the approach:

• takes into consideration three types of descriptors
(nominal, linear and structured) and can use descriptors
of different arity (variables, n-ary relations and
functions)

• takes into consideration the properties of the
interrelationships of descriptors, characteristic to the
recognition problem at hand

• gives the possibility of defining (within limits) a
preference criterion, measuring the quality of the
rules, that is most suited to the application

• has an ability to generate new descriptors ('metadescriptors')
and blend them smoothly with the initial ones to provide
a basis from which the final description chooses its most
appropriate descriptors

• provides uniformity of the representation of initial and
final descriptions (i.e., in terms of VL rules) and of
inference and generalization rules

• permits the person stating the problem to suggest various
arithmetic transformations of the original (linear)
variables which look promising as relevant characterization of
object classes.

Among major limitations of the presented work is a quite
limited form of expressing initial and final descriptions (i.e., in the
form of disjunctive simple VL21 expressions), and a restricted number
of operators the program (implementing the approach) understands and uses
in inducing descriptions. Another limitation is that the program does
not differentiate among possible types of linear descriptors (e.g.,
ordinal, interval, ratio and absolute). Also, it does not take into
consideration any probabilistic information, nor is it able to
automatically search for appropriate algebraic transformations. These
limitations do not seem, however, to be inherent to the approach.

Also, the questions pertinent to the computational efficiency
of algorithms used have not been investigated.